Grounding of Textual Phrases in Images by Reconstruction

Authors

  • Anna Rohrbach
  • Marcus Rohrbach
  • Ronghang Hu
  • Trevor Darrell
  • Bernt Schiele
Abstract

Grounding (i.e., localizing) arbitrary, free-form textual phrases in visual content is a challenging problem with many applications in human-computer interaction and image-text reference resolution. Although many data sources contain images described with sentences or phrases, they typically do not provide the spatial localization of the phrases. This is true both for curated datasets such as MSCOCO [23] and for large collections of user-generated content such as the YFCC 100M dataset [31]. Consequently, being able to learn from this data without grounding supervision would make a large amount and variety of training data available. For this setting we propose GroundeR, a novel approach that learns the grounding by aiming to reconstruct a given phrase using an attention mechanism. More specifically, during training, the model encodes the phrase using an LSTM and then has to learn to attend to the relevant image region in order to reconstruct the input phrase. At test time, the correct attention, i.e., the grounding, is evaluated. On the Flickr 30k Entities dataset [26], our approach outperforms prior work which, in contrast to ours, trains with the grounding (bounding box) annotations.
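The attention step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the dot-product scoring is an assumed stand-in for whatever learned scoring the model uses, the LSTM encoder and the reconstruction decoder are omitted, and the names `attend`, `ground`, `phrase_vec`, and `region_feats` are hypothetical. The sketch only shows how a phrase encoding could be turned into attention weights over candidate regions, and how the highest-weighted region could be read off as the grounding at test time.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(phrase_vec, region_feats):
    """Return attention weights over regions and the attended feature.

    Assumption: regions are scored by a dot product with the phrase
    encoding; a trained model would use a learned scoring function.
    """
    scores = [sum(p * r for p, r in zip(phrase_vec, feat))
              for feat in region_feats]
    weights = softmax(scores)
    dim = len(region_feats[0])
    # Weighted sum of region features; during training, a decoder would
    # try to reconstruct the phrase from this vector (omitted here).
    attended = [sum(w * feat[d] for w, feat in zip(weights, region_feats))
                for d in range(dim)]
    return weights, attended


def ground(phrase_vec, region_feats):
    """Test-time grounding: index of the region with the largest weight."""
    weights, _ = attend(phrase_vec, region_feats)
    return max(range(len(weights)), key=weights.__getitem__)


# Toy inputs (made up for illustration): one phrase encoding and
# three candidate region feature vectors.
phrase_vec = [1.0, 0.0, 0.5]
region_feats = [
    [0.1, 0.9, 0.0],
    [0.9, 0.1, 0.8],
    [0.2, 0.2, 0.2],
]

weights, attended = attend(phrase_vec, region_feats)
best_region = ground(phrase_vec, region_feats)
```

In this toy setup the second region has the largest dot product with the phrase encoding, so it receives the highest attention weight and is selected as the grounding. No bounding-box label is used at any point, which is the property the abstract emphasizes.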


Similar resources

An Attention-based Regression Model for Grounding Textual Phrases in Images

Grounding, or localizing, a textual phrase in an image is a challenging problem that is integral to visual language understanding. Previous approaches to this task typically make use of candidate region proposals, where end performance depends on that of the region proposal method and additional computational costs are incurred. In this paper, we treat grounding as a regression problem and prop...


Learning Unsupervised Visual Grounding Through Semantic Self-Supervision

Localizing natural language phrases in images is a challenging problem that requires joint understanding of both the textual and visual modalities. In the unsupervised setting, the lack of supervisory signals exacerbates this difficulty. In this paper, we propose a novel framework for unsupervised visual grounding which uses concept learning as a proxy task to obtain self-supervision. The simple int...


Finding “It”: Weakly-Supervised Reference-Aware Visual Grounding in Instructional Videos

Grounding textual phrases in visual content with standalone image-sentence pairs is a challenging task. When we consider grounding in instructional videos, this problem becomes profoundly more complex: the latent temporal structure of instructional videos breaks independence assumptions and necessitates contextual understanding for resolving ambiguous visual-linguistic cues. Furthermore, dense ...


Grounding Visual Explanations (Extended Abstract)

Existing models [2] which generate textual explanations enforce task relevance through a discriminative term loss function, but such mechanisms only weakly constrain mentioned object parts to actually be present in the image. In this paper, a new model is proposed for generating explanations by utilizing localized grounding of constituent phrases in generated explanations to ensure image releva...


Conditional Image-Text Embedding Networks

This paper presents an approach for grounding phrases in images which jointly learns multiple text-conditioned embeddings in a single end-to-end model. In order to differentiate text phrases into semantically distinct subspaces, we propose a concept weight branch that automatically assigns phrases to embeddings, whereas prior works predefine such assignments. Our proposed solution simplifies th...




Publication date: 2016